In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1] or total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with the notable (and useful) difference that it is always a finite value. The square root of the Jensen–Shannon divergence is a metric.[3][4]
Consider the set $M_+^1(A)$ of probability distributions, where $A$ is a set provided with some σ-algebra of measurable subsets. In particular we can take $A$ to be a finite or countable set with all subsets being measurable.
The Jensen–Shannon divergence (JSD) $M_+^1(A) \times M_+^1(A) \to [0, \infty)$ is a symmetrized and smoothed version of the Kullback–Leibler divergence $D(P \parallel Q)$. It is defined by

$${\rm JSD}(P \parallel Q) = \frac{1}{2} D(P \parallel M) + \frac{1}{2} D(Q \parallel M),$$

where $M = \frac{1}{2}(P + Q)$ is the mixture distribution of $P$ and $Q$.
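As a concrete illustration of this definition, here is a minimal numerical sketch for discrete distributions given as probability vectors (the function names kl_divergence and jsd are illustrative only; base-2 logarithms are used, so the result is in bits):

    import numpy as np

    def kl_divergence(p, q):
        # Kullback–Leibler divergence D(P || Q) in bits for discrete distributions
        # represented as probability vectors over the same finite alphabet.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        mask = p > 0  # terms with p_i = 0 contribute nothing by convention
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

    def jsd(p, q):
        # Jensen–Shannon divergence: the average KL divergence to the mixture M.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        m = 0.5 * (p + q)
        return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

    # Two non-overlapping distributions attain the maximum value of 1 bit.
    print(jsd([1.0, 0.0], [0.0, 1.0]))  # 1.0

Because the mixture $M$ is nonzero wherever $P$ or $Q$ is nonzero, both KL terms are finite, which is why the JSD is always a finite value.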
If $A$ is countable, a more general definition, allowing for the comparison of more than two distributions, is:

$${\rm JSD}_{\pi_1, \ldots, \pi_n}(P_1, P_2, \ldots, P_n) = H\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i),$$

where $\pi_1, \ldots, \pi_n$ are the weights for the probability distributions $P_1, P_2, \ldots, P_n$ and $H(P)$ is the Shannon entropy for distribution $P$. For the two-distribution case described above, $P_1 = P$, $P_2 = Q$, and $\pi_1 = \pi_2 = \frac{1}{2}$.
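The general form can be computed directly from the entropy of the mixture and the entropies of the individual distributions. A hedged sketch, continuing the conventions above (shannon_entropy and generalized_jsd are hypothetical helper names, not from any cited reference):

    import numpy as np

    def shannon_entropy(p):
        # Shannon entropy H(P) in bits of a discrete distribution p.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def generalized_jsd(distributions, weights):
        # H(sum_i pi_i P_i) - sum_i pi_i H(P_i), with the distributions given as
        # rows of a matrix over a common finite alphabet and weights summing to 1.
        distributions = np.asarray(distributions, dtype=float)
        weights = np.asarray(weights, dtype=float)
        mixture = weights @ distributions
        return shannon_entropy(mixture) - sum(
            w * shannon_entropy(p) for w, p in zip(weights, distributions))

    # With n = 2 and equal weights, this reduces to the two-distribution JSD.
    print(generalized_jsd([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]))  # 1.0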
According to Lin (1991), the Jensen–Shannon divergence between two probability distributions is bounded by 1, given that one uses the base-2 logarithm.
The Jensen–Shannon divergence is the mutual information between a random variable $X$ drawn from the mixture distribution $M = \frac{1}{2}(P + Q)$ and a binary indicator variable $Z$, where $Z = 0$ if $X$ is from $P$ and $Z = 1$ if $X$ is from $Q$.
It follows from the above result that the Jensen–Shannon divergence is bounded between 0 and 1, because mutual information is non-negative and bounded by the entropy of the indicator variable, $H(Z) = 1$ bit.
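A quick numerical check of this interpretation, under the same illustrative assumptions as the earlier sketch (it reuses the hypothetical jsd function defined above):

    import numpy as np

    # Check the identity JSD(P || Q) = I(X; Z), where Z is a fair binary
    # indicator and X is drawn from P if Z = 0 and from Q if Z = 1.
    p = np.array([0.7, 0.2, 0.1])
    q = np.array([0.1, 0.4, 0.5])

    joint = np.vstack([0.5 * p, 0.5 * q])   # joint distribution p(z, x) = pi_z * P_z(x)
    px = joint.sum(axis=0)                  # marginal of X: the mixture M
    pz = joint.sum(axis=1)                  # marginal of Z: (1/2, 1/2)
    mask = joint > 0
    mutual_information = np.sum(
        joint[mask] * np.log2(joint[mask] / np.outer(pz, px)[mask]))

    print(np.isclose(mutual_information, jsd(p, q)))  # True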
One can apply the same principle to a joint distribution and the product of its two marginal distributions (in analogy to the Kullback–Leibler divergence and mutual information) to measure how reliably one can decide whether a given response comes from the joint distribution or from the product distribution, given that these are the only two possibilities.[5]
The generalization of probability distributions to density matrices allows one to define the quantum Jensen–Shannon divergence (QJSD).[6][7] It is defined for a set of density matrices $(\rho_1, \ldots, \rho_n)$ and a probability distribution $\pi = (\pi_1, \ldots, \pi_n)$ as

$${\rm QJSD}(\rho_1, \ldots, \rho_n) = S\left(\sum_{i=1}^n \pi_i \rho_i\right) - \sum_{i=1}^n \pi_i S(\rho_i),$$

where $S(\rho)$ is the von Neumann entropy of $\rho$. This quantity was introduced in quantum information theory, where it is called the Holevo information: it gives the upper bound for the amount of classical information encoded by the quantum states $(\rho_1, \ldots, \rho_n)$ under the prior distribution $\pi$ (see Holevo's theorem).[8] The quantum Jensen–Shannon divergence for $\pi = \left(\frac{1}{2}, \frac{1}{2}\right)$ and two density matrices is a symmetric function, everywhere defined, bounded, and equal to zero only if the two density matrices are the same. It is a square of a metric for pure states,[9] but it is unknown whether the metric property holds in general.[7]
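Under the same illustrative conventions (base-2 logarithms, hypothetical function names), a minimal sketch of this quantity for density matrices might look like:

    import numpy as np

    def von_neumann_entropy(rho):
        # S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho.
        eigenvalues = np.linalg.eigvalsh(rho)
        eigenvalues = eigenvalues[eigenvalues > 1e-12]  # drop numerically zero eigenvalues
        return -np.sum(eigenvalues * np.log2(eigenvalues))

    def qjsd(rhos, weights):
        # S(sum_i pi_i rho_i) - sum_i pi_i S(rho_i) for density matrices rhos.
        mixture = sum(w * rho for w, rho in zip(weights, rhos))
        return von_neumann_entropy(mixture) - sum(
            w * von_neumann_entropy(rho) for w, rho in zip(weights, rhos))

    # Two orthogonal pure states with pi = (1/2, 1/2) give QJSD = 1 bit.
    rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
    rho1 = np.array([[0.0, 0.0], [0.0, 1.0]])
    print(qjsd([rho0, rho1], [0.5, 0.5]))  # 1.0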
For applications of the Jensen–Shannon divergence in bioinformatics and genome comparison, see (Sims et al., 2009; Itzkovitz et al., 2010); for protein surface comparison, see (Ofran and Rost, 2003).